Abstract.

We present a new code for evolving three-dimensional self-gravitating
collisionless systems with a large number of particles, N > 10^7.
FLY (Fast Level-based N-bodY code) is a fully parallel code based on a
tree algorithm. It adopts periodic boundary conditions implemented by
means of the Ewald summation technique. FLY relies on the one-sided
communication paradigm to share data among the processors, which access
remote private data without any kind of synchronization. The code was
originally developed on a CRAY T3E system using the SHMEM library
and was later ported to the SGI ORIGIN 2000 and to the IBM SP (on the
latter making use of the LAPI library). FLY¹ version 1.1 is open-source,
freely available code.

FLY data output can be analysed with AstroMD, an analysis and
visualization tool specifically designed for astrophysical data.
AstroMD can manage different physical quantities. It can identify
structures that have no well-defined shape or symmetry, and perform
quantitative calculations on selected regions.



AstroMD² is a freely available code.



1. Introduction 

FLY is the tree N-body code that we design, develop and use to run very large
simulations of the Large Scale Structure of the universe on parallel MPP and
SMP systems. FLY uses the Leapfrog numerical integration scheme for
performance reasons, and incorporates fully periodic boundary conditions using
the Ewald method. The I/O data format is integrated with the AstroMD package.

AstroMD is an analysis and visualization tool specifically designed for
astrophysical data. It can find structures that have no well-defined shape or
symmetry, and performs quantitative calculations on a selected region or
structure. AstroMD makes use of Virtual Reality techniques, which are
particularly effective for understanding the three-dimensional distribution of
the fields, their geometry, topology and specific patterns. The display of
data gives the illusion of a surrounding medium in which the user is immersed.
The result is that the user has the impression of travelling through a
computer-based multi-dimensional model that can be directly manipulated by
hand.

1 http://www.ct.astro.it/fly/
2 http://www.cineca.it/astromd

2. FLY code 

The FLY code, written in Fortran 90 and C, uses the one-sided communication
paradigm; it was developed on the CRAY T3E using the SHMEM library. FLY has
three main characteristics: a simple domain decomposition, a grouping
strategy, and a data buffering scheme, which together minimize data
communication.

2.1. Domain decomposition 

FLY does not split the domain with orthogonal planes; instead, the domain
decomposition assigns an equal number of particles to each processor.
The input particle data is a sorted file containing the position and velocity
fields, ordered so that particles with nearby tag numbers are also close in
physical space. The arrays containing the tree properties are distributed
using a fine-grain data distribution.
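
The equal-count decomposition described above can be sketched in Python as a
contiguous block partition of the sorted particle tags. The function name and
the handling of a remainder (spread over the first PEs) are assumptions for
this example; the text only states that each processor receives an equal
number of particles.

```python
def block_range(n_particles, n_pes, pe):
    """Return the half-open tag range [start, end) owned by processor `pe`.

    Hypothetical helper: particles are assumed to be sorted so that
    consecutive tags are close in space, so a contiguous block of tags
    is also a compact region.
    """
    base, remainder = divmod(n_particles, n_pes)
    start = pe * base + min(pe, remainder)
    count = base + (1 if pe < remainder else 0)
    return start, start + count
```

For example, with 16,777,216 particles on 32 PEs every processor owns exactly
524,288 consecutive tags.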

2.2. Grouping 

During the tree walk procedure, FLY builds a single interaction list (IL) that
is applied to all particles inside a grouping cell ($C_{group}$). This reduces
the number of tree accesses needed to build the IL. We consider a hypothetical
particle, which we call the Virtual Body (VB), placed at the center of mass of
$C_{group}$. The VB interaction list $IL_{VB}$ is formed by two parts:

    $IL_{VB} = IL_{far} + IL_{near}$    (1)

where $IL_{far}$ includes the elements farther from the VB than a threshold
parameter and $IL_{near}$ includes the elements near the VB. Using the two
lists, it is possible to compute the force $F_p$ on each particle p in
$C_{group}$ as the sum of two components:

    $F_p = F_{far} + F_{near}$    (2)

The component $F_{far}$ is computed only once, for the VB, and is applied to
all the particles p, while the $F_{near}$ component is computed separately for
each particle. The size of $C_{group}$, and the tree level at which it can be
considered, is constrained by the maximum allowed value of the overall error
of this method. In this sense FLY is a level-based code.
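
The force splitting of Eq. (2) can be sketched in Python as follows. This is
an illustration under stated assumptions, not FLY's implementation: list
elements are treated as point masses given as (mass, x, y, z) tuples, G = 1, a
softening length `eps` is added, and accelerations (force per unit mass) are
returned. The function names are hypothetical, and the two interaction lists
are supplied by the caller rather than built by a tree walk.

```python
import math


def acceleration(sources, point, eps=1e-3):
    """Sum softened point-mass accelerations (G = 1) at `point`."""
    ax = ay = az = 0.0
    for mass, x, y, z in sources:
        dx, dy, dz = x - point[0], y - point[1], z - point[2]
        r2 = dx * dx + dy * dy + dz * dz + eps * eps
        w = mass / (r2 * math.sqrt(r2))
        ax += w * dx
        ay += w * dy
        az += w * dz
    return (ax, ay, az)


def group_accelerations(group, il_far, il_near):
    """Apply F_p = F_far + F_near to every particle of a grouping cell."""
    total = sum(p[0] for p in group)
    # Virtual Body: center of mass of the grouping cell.
    vb = tuple(sum(p[0] * p[i] for p in group) / total for i in (1, 2, 3))
    f_far = acceleration(il_far, vb)            # computed once, at the VB
    result = []
    for p in group:
        f_near = acceleration(il_near, p[1:])   # computed per particle
        result.append(tuple(a + b for a, b in zip(f_far, f_near)))
    return result
```

The far contribution is evaluated once per cell instead of once per particle,
at the price of an error that grows with the size of the cell relative to the
distance of the far elements.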

2.3. Data Buffering 

The data buffer is managed as a simulated cache in the local RAM. Every time
a PE has to access a remote element, it first looks in the local simulated
cache; if the element is not found, the PE executes GET calls to download the
remote element and stores it in the buffer. In a clustered simulation with
16 million particles, 32 PEs and 256 Mbytes of local memory, the PEs execute
about 2.1 × 10^10 remote GETs without the simulated cache. With data
buffering this value decreases to 1.6 × 10^8 remote GETs, with an enormous
advantage in terms of scalability and performance.

Figure 1. CRAY T3E/1200e. FLY particles/second using 32 and 64
PEs in uniform and clustered conditions, and scalability from 32 PEs
to 128 PEs in a 16-million-particle simulation.

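
The look-up policy described above can be sketched in Python as a small
read-through buffer. The class name, the `remote_get` callable standing in for
a one-sided GET, and the rule that elements are simply not stored once the
buffer is full are assumptions for this example; the text does not describe
how FLY sizes or refills its buffer.

```python
class SimulatedCache:
    """Local buffer in front of a remote fetch (hypothetical sketch)."""

    def __init__(self, remote_get, capacity):
        self.remote_get = remote_get   # stands in for a remote GET call
        self.capacity = capacity       # maximum number of buffered elements
        self.buffer = {}
        self.remote_gets = 0

    def read(self, key):
        # Look in the local simulated cache first.
        if key in self.buffer:
            return self.buffer[key]
        # Miss: fetch the remote element and count the GET.
        value = self.remote_get(key)
        self.remote_gets += 1
        # Store it only while there is room (assumed policy).
        if len(self.buffer) < self.capacity:
            self.buffer[key] = value
        return value
```

Counting `remote_gets` with and without the buffer is the kind of measurement
behind the figures quoted above.
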
Fig. 1 shows the code performance on the CRAY T3E system, obtained by running
simulations with 32 and 64 PEs, and the scalability of FLY for the case of
16,777,216 particles, where a speed-up factor of 118 is reached using 128 PEs.
The highest performance, obtained in a clustered configuration, is a positive
effect of the grouping strategy. These results show that FLY scales very
well, delivers high performance, and can be used to run very large
simulations.
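
To put the figures quoted in this section in perspective, the short Python
calculation below derives a parallel efficiency and the reduction in remote
GETs. It assumes that the speed-up of 118 is measured against an ideal value
of 128 on 128 PEs; the baseline is not stated explicitly in the text, so the
efficiency is indicative only. The helper names are hypothetical.

```python
def parallel_efficiency(speedup, n_pes):
    """Fraction of the ideal speed-up that is actually achieved."""
    return speedup / n_pes


def reduction_factor(before, after):
    """How many times smaller `after` is than `before`."""
    return before / after


efficiency = parallel_efficiency(118, 128)
get_reduction = reduction_factor(2.1e10, 1.6e8)
print(f"parallel efficiency on 128 PEs: {efficiency:.3f}")
print(f"reduction in remote GETs: {get_reduction:.0f}x")
```

Under that assumption the efficiency is about 92%, and the simulated cache
removes more than 99% of the remote GETs (a reduction of roughly 130 times).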

3. AstroMD package 

AstroMD is developed using the Visualization Toolkit (VTK) by Kitware, freely
available software that is portable across platforms ranging from the PC
to the most powerful visualization systems, with good scalability. Data are
visualized with respect to a box that can describe the whole computational
mesh or a sub-mesh. AstroMD can find structures that have no well-defined
shape or symmetry, and performs quantitative calculations on a selected
region or structure. The user can choose the sample of the loaded particle
type (i.e. stars, dark matter and gas particles), the size of the
visualization box and the starting time from which to show the evolution of
the simulation.
The Density entries control the visualization of the iso-surfaces. These are
calculated on a grid whose resolution is user-selectable. To investigate a
subset of the visualized system more accurately, the user can employ a cubic
sampler. If the sampler is selected (Show/Hide menu), visualized in the scene
and enabled (Sampler menu), all the computations are performed only inside
the region of the sampler (Fig. 2).

Figure 2. Typical structure of voids and filaments and iso-surfaces,
in a sub-region at the end of a simulation.

AstroMD can also show the evolution in time of the simulated system over the
whole interval of time for which data are available, performing
interpolations at intermediate frames. During the evolution, the current time
is displayed in the Time entry of the Cloud section. Finally, snapshots of the
displayed images can be created using the Take it button in the Screenshot
menu.

AstroMD is developed by the VISIT (Visual Information Technology) laboratory
at CINECA (Casalecchio di Reno - Bologna) in collaboration with the
Astrophysical Observatory of Catania.
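
The cubic sampler and the density grid described in this section can be
illustrated with a minimal Python sketch. It does not reproduce AstroMD or
VTK code: the function names, the representation of the sampler by a center
and a half side, and the use of simple particle counts per cell as the
density estimate are all assumptions for this example.

```python
def in_sampler(point, center, half_side):
    """True if `point` lies inside the cube centered on `center`."""
    return all(abs(p - c) <= half_side for p, c in zip(point, center))


def density_grid(points, center, half_side, n_cells):
    """Count particles per cell on an n_cells^3 grid covering the sampler.

    Returns a dict mapping (i, j, k) cell indices to particle counts;
    particles outside the sampler are ignored.
    """
    side = 2.0 * half_side
    origin = [c - half_side for c in center]
    counts = {}
    for point in points:
        if not in_sampler(point, center, half_side):
            continue
        idx = tuple(
            min(int((p - o) / side * n_cells), n_cells - 1)
            for p, o in zip(point, origin)
        )
        counts[idx] = counts.get(idx, 0) + 1
    return counts
```

A gridded field of this kind is the sort of input on which an iso-surface can
then be extracted; a finer grid resolves smaller structures at a higher
computational cost.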